Abstract: Unequal transition probabilities between input and
output symbols, input power constraints, or input symbols of
unequal durations can lead to non-uniform capacity achieving
input distributions for communication channels. Using uniform
input distributions reduces the achievable rate; the resulting
loss is called the shaping gap. Gallager's idea for reliable
communication with zero shaping gap is to perform encoding,
matching, and joint decoding and dematching. In this work, a
scheme is proposed that consists of matching, encoding, decoding,
and dematching. Only matching is channel specific whereas coding
is not. Thus, off-the-shelf LDPC codes can be applied. Analytical
formulas for the shaping and coding gap of the proposed scheme are
derived, and it is shown that the shaping gap can be made zero.
Numerical results show that the proposed scheme allows
off-the-shelf LDPC codes to be operated with zero shaping gap and
a coding gap that is unchanged compared to uniform transmission.

I. Introduction 

The ultimate rate for reliable communication over a noisy 
channel is given by the maximum mutual information between 
input and output. A channel is called non-uniform if the input
distribution that achieves this maximum is non-uniform. The
non-uniformity can result from different factors; examples
are channels that are not symmetric [1, Theorem 8.2.1],
average power constraints on input symbols [2, Sec. IV], input
symbols of unequal duration [3], and multi-user channels with
cross-talk [4]. Conventional communication systems restrict
the channel input to be uniformly distributed. The resulting
penalty in terms of achievable rates is called the shaping gap.

One approach to close the shaping gap goes back to Gallager
[5, p. 208] and is nicely explained by McEliece in [6, Sec. 5].
The idea is to first encode the data and then to use a
fixed-to-fixed length mapper from the codewords to the channel
symbols. The mapping is chosen such that the capacity achieving
probability mass function (pmf) is approximated. To achieve
this, the mapping is many-to-one. For iterative decoding, a
probabilistic demapper needs to be incorporated into the factor
graph of the decoder [6, Fig. 8]. This idea was extended to
non-binary low-density parity-check (LDPC) codes by Ratzer
and MacKay in [7] and by Bennatan and Burshtein in [8].

Franceschini et al. conjectured in [9] that LDPC codes have
universal properties, which basically means the following: if
an LDPC code is designed for some channel A, it can be used
for a different channel B, as long as the mutual information
between input and output is the same for channel A
and channel B. See [10] and references therein for analytical
support of this conjecture.

In this work, we use the conjecture of Franceschini et al.
as a design paradigm. That is, instead of designing LDPC
codes for specific non-uniform channels, we show how existing
LDPC encoders/decoders that were originally designed for
uniform channels can be operated with zero shaping gap on
non-uniform channels.

We propose the following transmission scheme. First, the
binary equiprobable data stream is matched to the channel by
a prefix-free code [11]. Then, the matched bits are encoded
by a systematic code. The check bits are iteratively matched
to the channel and encoded such that all transmitted bits are
matched to the channel. At the receiver side, the packets are
first decoded and then dematched. The central property of
our scheme is that shaping (matcher) and error correction
(encoder/decoder) can be designed independently, in the sense
that the matcher only influences the parameters passed to the
encoder (i.e., data symbols) and decoder (i.e., log-likelihood
ratios (LLR) and priors). This is a fundamental difference
to Gallager's scheme, where decoding and dematching have
to be performed jointly. For binary channels with unequal
symbol durations, we provide analytical formulas for the
shaping gain and the coding gain of our scheme. We show that
capacity achieving matching codes can efficiently be found
using Geometric Huffman Coding [11]. Our results can easily
be extended to arbitrary discrete input memoryless channels.
Finally, we apply our scheme to a binary symmetric channel
(BSC) with symbol durations $w_0 = 1$ and $w_1 = 5$ and varying
error probability $\epsilon$. We use off-the-shelf LDPC codes, namely
the codes used in DVB-S2 as implemented in the Matlab
Communications Toolbox. For uniform transmission, we observe
a shaping gap of 20% while for matched transmission, the
shaping gap is virtually zero. For both schemes, the coding
gap is almost equal, which is in perfect accordance with
Franceschini's conjecture.

The remainder of this paper is organized as follows. In
Section II, we define the binary channel with unequal symbol
durations. In Section III, we develop the proposed scheme.
Formulas for the shaping and coding gain are derived in
Section IV. Capacity achieving matchers are defined in Section V.
Finally, we provide numerical results in Section VI.

Fig. 1. Bootstrap scheme. The acronym ECC stands for error correction coding.

II. Problem Statement 

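A binary channel (BC) is specified by a matrix of transition
probabilities $(h_{ji})$, where $h_{ji}$ is the probability of
output $j$ given input $i$. An input pmf $p$ relates to its
corresponding output pmf $r$ as

$$\begin{pmatrix} r_0 \\ r_1 \end{pmatrix} = \begin{pmatrix} h_{00} & h_{01} \\ h_{10} & h_{11} \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \end{pmatrix}. \tag{1}$$

The mutual information between input and output is according
to Gallager [12, Eq. (8.73)] given by

$$I(p) = \sum_i p_i \sum_j h_{ji} \log \frac{h_{ji}}{r_j}. \tag{2}$$
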
We consider BCs where the input symbols 0 and 1 have
unequal durations $w = (w_0, w_1)^T$. According to [13, Theorem 2],
the capacity of such a BC is given by

$$C = \max_p \frac{I(p)}{w^T p}. \tag{3}$$

Jimbo and Kunisawa [3] proposed a variation of the
Blahut-Arimoto algorithm [2], [14] to efficiently find the capacity
achieving distribution $p^*$ that maximizes (3). We denote the
uniform pmf by $u = (0.5, 0.5)^T$. The shaping gain that results
from using $u$ instead of $p^*$ is given by

$$\frac{I(u)}{w^T u} \cdot \frac{1}{C}. \tag{4}$$

In the case of equal symbol durations $w_0 = w_1$, this gain is
at least 0.942 [15, p. 295] and there is little reason not to
use the uniform pmf. However, for $w_0 \neq w_1$, the gain can be
arbitrarily close to zero. This motivates us to seek a transmission
scheme that closes the shaping gap for arbitrary symbol durations $w$
while allowing the use of existing LDPC encoders/decoders.
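The capacity (3) and the shaping gain (4) can be explored numerically for a BSC. The following is a minimal sketch, not the algorithm of Jimbo and Kunisawa [3]: it assumes a BSC with crossover probability `eps`, writes the input pmf as $p = (1-q, q)^T$, and approximates the maximum in (3) by a plain grid search over `q`. All function names are hypothetical.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def bsc_rate(q, eps, w):
    """I(p) / (w^T p) for a BSC with crossover eps and input pmf p = (1 - q, q)."""
    r1 = q * (1 - eps) + (1 - q) * eps       # probability of output symbol 1
    mutual_info = h2(r1) - h2(eps)           # I(p) from (2), specialised to the BSC
    duration = (1 - q) * w[0] + q * w[1]     # w^T p
    return mutual_info / duration

def capacity(eps, w, steps=20000):
    """Approximate C in (3) by a grid search over q (assumption: grid is fine enough)."""
    best_q = max((i / steps for i in range(steps + 1)),
                 key=lambda q: bsc_rate(q, eps, w))
    return bsc_rate(best_q, eps, w), best_q

def shaping_gain_uniform(eps, w):
    """Shaping gain (4) of the uniform pmf u = (0.5, 0.5)."""
    c, _ = capacity(eps, w)
    return bsc_rate(0.5, eps, w) / c

if __name__ == "__main__":
    print(shaping_gain_uniform(0.03, (1, 5)))   # unequal durations
    print(shaping_gain_uniform(0.03, (1, 1)))   # equal durations
```

For equal durations the uniform pmf is optimal for the BSC, so the second value is 1; for $w = (1, 5)^T$ the gain drops noticeably below 1.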

III. Bootstrap Scheme 

A. Prefix-Free Matcher Codes 

The digital interface between source and channel coding is
a stream of independent, identically distributed (iid) equiprobable
bits. By parsing the stream by a full prefix-free code,
a dyadic pmf can be generated [11]. For example, consider
the set of binary input symbol blocks of length 2, namely
{00, 01, 10, 11}. Then the mapping

$$1 \mapsto 00, \quad 01 \mapsto 01, \quad 001 \mapsto 10, \quad 000 \mapsto 11 \tag{5}$$

generates the pmf $(2^{-1}, 2^{-2}, 2^{-3}, 2^{-3})$ over the set
{00, 01, 10, 11} when the stream of iid equiprobable
bits is parsed by the prefix-free code {1, 01, 001, 000}. We
call a device that implements this procedure a matcher. We
will see in Section V how capacity achieving matchers can be
found efficiently.

When matchers are used for noisy channels, a severe problem
occurs: a single bit error can lead to the complete loss of a
block, since the binary input and output streams get out of sync.
For example, suppose the data bits 101001 were mapped by the
matcher (5) to the block 000110 and then transmitted over the
channel, and suppose 010110 was detected at the channel output.
Then,

$$101001 \mapsto 000110 \tag{6}$$
$$010110 \mapsto 0101001 \tag{7}$$

i.e., matcher input and dematcher output are of different
length, and aligning the two strings at their first bit leads to
5 bit errors in the overlapping part. Error correction based on
matcher input and dematcher output needs the capability of
correcting insertion and deletion errors, which is difficult [16].
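The example in (5) to (7) can be replayed in a few lines. This is an illustrative sketch; the names `match`, `dematch`, and `overlap_errors` are hypothetical and only mapping (5) is taken from the text.

```python
MATCHER = {"1": "00", "01": "01", "001": "10", "000": "11"}   # mapping (5)
DEMATCHER = {block: word for word, block in MATCHER.items()}

def match(bits):
    """Parse `bits` by the prefix-free code and emit the matched 2-bit blocks."""
    out, word = [], ""
    for b in bits:
        word += b
        if word in MATCHER:
            out.append(MATCHER[word])
            word = ""
    if word:
        raise ValueError("input ends inside a codeword")
    return "".join(out)

def dematch(bits):
    """Invert the matcher block by block."""
    return "".join(DEMATCHER[bits[i:i + 2]] for i in range(0, len(bits), 2))

def overlap_errors(a, b):
    """Bit errors when both strings are aligned at their first bit."""
    return sum(x != y for x, y in zip(a, b))

sent = match("101001")          # matcher output, as in (6)
received = "010110"             # one channel bit error in the second position
decoded = dematch(received)     # dematcher output, as in (7)
print(sent, decoded, overlap_errors("101001", decoded))
```

A single flipped channel bit changes the length of the dematcher output, which is why the comparison has to be restricted to the overlapping part.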

B. Reverse Concatenation 

The above problem can be solved by interchanging the order 
of error-correction coding and matching. This approach is used 
for constrained systems by first applying constrained coding 
and then error correction coding, see [17] and references 
therein. Suppose we have applied the matcher and apply error 
correction coding to the matcher output. In LDPC codes, sums
of bits modulo 2 are transmitted. As the number of summands
grows, the resulting bit is uniformly distributed independent
of the pmf of the summands, i.e.,

$$p = i_1 \oplus i_2 \oplus \cdots \oplus i_m \sim U\{0, 1\}. \tag{8}$$

Thus, even if the bits $i_1, \ldots, i_m$ were matched to the channel,
the check bit $p$ is not. A first step to circumvent this problem is to use
systematic codes. Here, the data bits are transmitted unchanged
and the check bits are appended. Thus, the data bits can be
shaped and the check bits cannot. This already gives an
improvement compared to uniform transmission. This approach
was suggested under the name sparse-dense codes by Ratzer
in [7]. However, this approach does not close the shaping gap,
e.g., for 1/2-rate codes, only half of the transmitted bits are
matched to the channel.

Fig. 2. Building block of the bootstrap scheme.
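How quickly a modulo-2 sum loses the bias of its summands can be checked with a short recursion. The sketch below makes a simplifying assumption that is not in the text: the summands are iid with P(bit = 1) = q, whereas in the scheme the matched bits follow a block pmf. The function name is hypothetical.

```python
def parity_one_probability(q, m):
    """P(i_1 xor ... xor i_m = 1) for m iid bits with P(i = 1) = q."""
    p = 0.0                                  # the empty sum equals 0 with probability 1
    for _ in range(m):
        p = p * (1 - q) + (1 - p) * q        # the parity flips when the next bit is 1
    return p

for m in (1, 2, 5, 10, 20):
    print(m, parity_one_probability(0.2, m))
```

The recursion agrees with the closed form $(1 - (1 - 2q)^m)/2$, which tends to 1/2 as m grows for any q strictly between 0 and 1.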

C. Bootstrapping the Check Bits 

We now introduce the bootstrap scheme, which allows the
check bits to be matched to the channel. Consider a systematic
binary block code that generates M check bits per K data bits,
i.e., a rate K/(K + M) code. At the transmitter side, B blocks
are sequentially encoded and then transmitted in reverse order.
For the first block, the matcher generates K matched bits from
the equiprobable input bit stream. From these K matched bits,
the encoder calculates M check bits. These check bits are
approximately iid and equiprobable. They are fed back and
appended to the bit stream. In the next round, the matcher
again generates K matched bits from the bit stream. Since the
M check bits from the first round were appended to the bit
stream, they are now matched to the channel as part of the K
matched bits of the second round. This procedure continues
until the second-to-last round. In the last round, an (M', K')
sparse-dense code is applied. The matcher maps the M check
bits from round B - 1 plus data bits to K' matched bits.
The encoder calculates M' check bits from the K' matched
bits and appends them (unmatched) to the K' matched bits.
These M' check bits of the last round are the ur-bits from
which the decoder will start to bootstrap all B blocks. All B
blocks are transmitted in reverse order such that the decoder
can immediately start decoding. See Fig. 1 for an illustration.
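The data flow of the bootstrap scheme can be traced on a noiseless channel with a toy example. The sketch below rests on several assumptions that are not in the text: the matcher is the small code (5), the systematic code is a hypothetical stand-in (M interleaved parity checks, not an LDPC code), the last round uses K' = K and M' = M, and the fed-back check bits are consumed first in the next round. The receiver only verifies the check bits instead of running a decoder.

```python
import random
from collections import deque

MATCHER = {"1": "00", "01": "01", "001": "10", "000": "11"}   # matcher (5)
DEMATCHER = {block: word for word, block in MATCHER.items()}

def match(stream, next_data_bit, K):
    """Consume unmatched bits (check bits first, then fresh data) until K matched bits exist."""
    out, word = "", ""
    while len(out) < K:                      # K must be even for this 2-bit matcher
        word += stream.popleft() if stream else next_data_bit()
        if word in MATCHER:
            out += MATCHER[word]
            word = ""
    return out

def dematch(block):
    return "".join(DEMATCHER[block[i:i + 2]] for i in range(0, len(block), 2))

def check_bits(block, M):
    """Toy systematic code (hypothetical): check bit j is the parity of bits j, j+M, j+2M, ..."""
    return "".join(str(block[j::M].count("1") % 2) for j in range(M))

def transmit(B, K, M, rng):
    data = []
    def next_data_bit():
        data.append(str(rng.randint(0, 1)))
        return data[-1]
    stream, blocks = deque(), []
    for _ in range(B):
        block = match(stream, next_data_bit, K)
        assert not stream, "K too small: the check bits must fit into one block"
        blocks.append(block)
        stream = deque(check_bits(block, M))     # fed back into the next round
    ur_bits = "".join(stream)                    # check bits of the last round, sent unmatched
    return blocks[::-1], ur_bits, "".join(data)  # blocks are sent in reverse order

def receive(blocks, known_checks, M):
    data = []
    for n, block in enumerate(blocks):
        # A real decoder would decode `block` with `known_checks` as perfectly known bits.
        assert check_bits(block, M) == known_checks
        bits = dematch(block)
        if n < len(blocks) - 1:                  # the first-encoded block carries no check bits
            known_checks, bits = bits[:M], bits[M:]
        data.append(bits)
    return "".join(reversed(data))

blocks, ur_bits, data = transmit(B=5, K=16, M=4, rng=random.Random(0))
print(receive(blocks, ur_bits, M=4) == data)
```

Each dematched block reveals the check bits of the block that is received next, which is the bootstrapping step; only the ur-bits travel unmatched.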

IV. Shaping and Coding Gain 

The number M' of ur-bits is independent of the number 
of packets B. Thus, as B grows, the fraction of matched bits 
approaches 1. Therefore, we will focus in the rest of this work 
on the building block of the bootstrap scheme, which consists 
of the transmission of K matched bits over a channel and 
decoding conditioned on M perfectly known check bits. See 
Fig. 2 for an illustration. Note that the M perfectly known
check bits are not for free: the M check bits of the next
packet to be received are embedded in matched form in the K
matched bits. We will take this into account when calculating
the effective transmission rate.
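As a quick check of the limit stated at the start of this section, assume that B - 1 blocks of K matched bits are followed by a last block of K' matched bits and M' unmatched ur-bits, as described in Section III-C. The fraction of matched bits among all transmitted bits is then

$$\frac{(B-1)K + K'}{(B-1)K + K' + M'} = 1 - \frac{M'}{(B-1)K + K' + M'} \xrightarrow{B \to \infty} 1.$$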

A. Shaping Gain 

Assume the matcher uses a variable-to-$k$ length code, i.e.,
blocks of $k$ matched bits are jointly generated from the
unmatched bits. Denote by $p$ the resulting block pmf. Denote
by $v$ the lengths of the blocks, e.g., if block $i$ consists of $\ell$
zeros and $k - \ell$ ones, the length is $v_i = \ell w_0 + (k - \ell) w_1$. The
mutual information rate is directly given by

$$I_{bs}(p) = \frac{I(p)}{v^T p} \tag{9}$$

and the shaping gain is given by

$$\frac{I_{bs}(p)}{C}. \tag{10}$$

B. Coding Gain 

We now calculate the effective transmission rate of the
bootstrap scheme, i.e., how many information bits per block
length are on average transmitted over the channel. Denote
by $H(p)$ the entropy in bits of $p$. The average information per
matched bit is $H(p)/k$. Thus, on average,

$$\frac{\#\{\text{matched bits}\}}{\#\{\text{unmatched bits}\}} = \frac{1}{H(p)/k} =: m. \tag{11}$$

The unmatched bits consist of the information bits and the M
check bits of the previous block. Thus

$$(\#\{\text{information bits}\} + M)\, m = \#\{\text{matched bits}\} = K \tag{12}$$
$$\Rightarrow \#\{\text{information bits}\} = \frac{K}{m} - M. \tag{13}$$

The average length of a matched bit is $v^T p / k$. Denote by
$c = K/(K+M)$ the code rate of the applied code. Altogether,
the effective transmission rate can now be written as

$$\frac{\#\{\text{information bits}\}}{K \frac{v^T p}{k}} = \frac{\frac{K}{m} - M}{K \frac{v^T p}{k}} \tag{14}$$
$$= \frac{\frac{H(p)}{k} - \frac{M}{K}}{\frac{v^T p}{k}} \tag{15}$$
$$= \frac{\frac{H(p)}{k} - \frac{M+K-K}{K}}{\frac{v^T p}{k}} \tag{16}$$
$$= \frac{\frac{H(p)}{k} - \frac{1-c}{c}}{\frac{v^T p}{k}} =: R_{bs}(p, c). \tag{17}$$

We define the coding gain by

$$\frac{R_{bs}(p, c)}{I_{bs}(p)} = \frac{\frac{H(p)}{k} - \frac{1-c}{c}}{\frac{I(p)}{k}}. \tag{18}$$
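The rate formulas can be evaluated for a small case. The sketch below is only an illustration under assumed parameters: it uses the dyadic pmf generated by the matcher (5) with k = 2, durations w = (1, 5), and a rate 3/4 code with K = 12 and M = 4; none of these numbers come from the simulations in Section VI, and the function names are hypothetical.

```python
import math

def entropy_bits(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def block_durations(k, w):
    """Duration v_i of every k-bit block, indexed by the integer value of the block."""
    return [sum(w[int(b)] for b in format(i, f"0{k}b")) for i in range(2 ** k)]

def effective_rate(p, k, w, c):
    """Information bits per unit of channel time for block pmf p and code rate c."""
    v = block_durations(k, w)
    avg_duration = sum(vi * pi for vi, pi in zip(v, p))        # v^T p
    return (entropy_bits(p) / k - (1 - c) / c) / (avg_duration / k)

def information_bits(p, k, K, M):
    """Average number of information bits carried by one block of K matched bits."""
    m = k / entropy_bits(p)        # matched bits per unmatched bit
    return K / m - M

p = [1 / 2, 1 / 4, 1 / 8, 1 / 8]   # pmf of blocks 00, 01, 10, 11 under matcher (5)
print(effective_rate(p, k=2, w=(1, 5), c=3 / 4))
print(information_bits(p, k=2, K=12, M=4))
```

Dividing the number of information bits by the average duration of a block of K matched bits gives the same value as `effective_rate`, which is a useful consistency check on the derivation.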

V. Capacity Achieving Matcher 

We now show that the shaping gap can indeed be made 
as small as desired by approximating the capacity achieving 
pmf p* with the matching code found via Geometric Huffman 
Coding (GHC) [11]. We show this by first calculating the 
penalty that results from using an input pmf different from 
p*, and second, by bounding this penalty. 



A. Using the Wrong PMF 

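The capacity achieving pmf p* is efficiently found by the
algorithm of Jimbo and Kunisawa [3]. Denote by r* the output
pmf that corresponds to p*. By [3, Lemma 2],

$$\sum_j h_{ji} \log \frac{h_{ji}}{r_j^*} \le C w_i, \quad \text{with equality if } p_i^* > 0. \tag{19}$$

Denote by $p$ some pmf with the only restriction that

$$p_i = 0 \text{ whenever } p_i^* = 0. \tag{20}$$

Now,

$$I(p) = \sum_i p_i \sum_j h_{ji} \log \frac{h_{ji}}{r_j} \tag{21}$$
$$= \sum_i p_i \sum_j h_{ji} \log \frac{h_{ji}\, r_j^*}{r_j\, r_j^*} \tag{22}$$
$$= \sum_i p_i \sum_j h_{ji} \log \frac{h_{ji}}{r_j^*} + \sum_i p_i \sum_j h_{ji} \log \frac{r_j^*}{r_j} \tag{23}$$
$$= C w^T p - D(r \| r^*), \tag{24}$$

where equality in the last line follows from (20) and (19),
together with $\sum_i p_i h_{ji} = r_j$.
Dividing by $w^T p$ yields

$$\frac{I(p)}{w^T p} = C - \frac{D(r \| r^*)}{w^T p}. \tag{25}$$

Thus, using pmf $p$ instead of the capacity achieving pmf p*
results in a penalty of $D(r \| r^*) / (w^T p)$.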
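The decomposition of the mutual information into capacity and penalty can be verified numerically. The sketch below assumes a BSC with crossover probability 0.03 and w = (1, 5), picks an arbitrary test pmf, and approximates p* by a grid search, which is a stand-in for the algorithm of [3]. The small residual between both sides comes from the grid resolution.

```python
import math

def kl_bits(a, b):
    return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

def output_pmf(p, h):
    """r_j = sum_i h[j][i] * p[i], as in (1)."""
    return [sum(h[j][i] * p[i] for i in range(2)) for j in range(2)]

def mutual_info(p, h):
    r = output_pmf(p, h)
    return sum(p[i] * h[j][i] * math.log2(h[j][i] / r[j])
               for i in range(2) for j in range(2) if p[i] > 0)

def capacity(h, w, steps=20000):
    """Grid search over p = (1 - q, q); not the Jimbo-Kunisawa algorithm."""
    def rate(q):
        return mutual_info([1 - q, q], h) / ((1 - q) * w[0] + q * w[1])
    q_star = max((i / steps for i in range(1, steps)), key=rate)
    return rate(q_star), [1 - q_star, q_star]

eps, w = 0.03, (1, 5)
h = [[1 - eps, eps], [eps, 1 - eps]]      # BSC transition matrix (h_ji)
C, p_star = capacity(h, w)
p = [0.6, 0.4]                            # some pmf different from p*
lhs = mutual_info(p, h)
rhs = C * (w[0] * p[0] + w[1] * p[1]) - kl_bits(output_pmf(p, h), output_pmf(p_star, h))
print(lhs, rhs)
```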

B. Capacity Achieving Prefix-Free Matcher 

By [1, Eq. (4.45)], the KL-distance between the output pmfs
$r$ and $r^*$ is upper bounded by the KL-distance between the
corresponding input pmfs, i.e., $D(r \| r^*) \le D(p \| p^*)$. Thus,
the penalty is upper bounded by

$$\frac{D(p \| p^*)}{w^T p}. \tag{26}$$

By jointly approximating the capacity achieving pmf of k
consecutive input symbols via GHC, by [11, Prop. 2], the
penalty bound (26) goes to zero as the block length k goes to
infinity. Consequently, the left-hand side of (25) approaches
capacity.
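The inequality between output and input KL-distances used above can be spot-checked with random pmfs. In this sketch the second pmf is an arbitrary pmf rather than a capacity achieving one, since the inequality holds for any pair of input pmfs; the BSC with crossover probability 0.03 and all names are assumptions for illustration.

```python
import math
import random

def kl_bits(a, b):
    """KL-distance D(a || b) in bits; assumes b > 0 wherever a > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

def through_channel(p, h):
    """Output pmf r with r_j = sum_i h[j][i] * p[i]."""
    return [sum(h_ji * p_i for h_ji, p_i in zip(row, p)) for row in h]

def contraction_gap(p, q, h):
    """D(p || q) minus the KL-distance of the corresponding output pmfs."""
    return kl_bits(p, q) - kl_bits(through_channel(p, h), through_channel(q, h))

def random_pmf(n, rng):
    x = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(x)
    return [v / total for v in x]

rng = random.Random(1)
eps = 0.03
h = [[1 - eps, eps], [eps, 1 - eps]]
worst = min(contraction_gap(random_pmf(2, rng), random_pmf(2, rng), h) for _ in range(1000))
print(worst)    # smallest observed gap over the random trials
```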

VI. Application to a BSC with $w = (1, 5)^T$

We now consider a binary symmetric channel (BSC) where
the input symbols have the weights $w(0) = 1$ and $w(1) = 5$.
We vary the bit error probability $\epsilon$ between 0.005 and 0.057.
As LDPC codes, we use the DVB-S2 codes with rates 8/9,
5/6, 4/5, and 3/4. The DVB-S2 codes are systematic and
amenable for simulation, since an implementation is readily
available in the Communications System Toolbox of Matlab.
Because the simulation is very time-intensive, we evaluate the
performance around block-error rates $p_b$ of $10^{-2}$. For each
value of $\epsilon$, the capacity achieving pmf p* and the capacity C are
calculated using the Jimbo-Kunisawa algorithm [3]. Both for
uniform and matched transmission, we use exactly the same
parity check matrices, encoders, and decoders, namely the
unchanged implementation in Matlab. The differences between
uniform and matched transmission are detailed next.



00 ↦ 0000       010 ↦ 0001      011 ↦ 0010      10111 ↦ 0011
100 ↦ 0100      10110 ↦ 0101    10100 ↦ 0110    1010101 ↦ 0111
110 ↦ 1000      11101 ↦ 1001    11100 ↦ 1010    101011 ↦ 1011
11110 ↦ 1100    111110 ↦ 1101   111111 ↦ 1110   1010100 ↦ 1111

Fig. 3. Matching code found by GHC. The same matching code was found
for the whole considered range of the error probability $\epsilon$.
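The matching code of Fig. 3 can be checked for the properties a matcher needs. In the sketch below, the codeword of block 0111 is taken to be 1010101, which is an assumption: it is the only 7-bit word for which the 16 codewords form a full prefix-free code, whereas 1010111 would have the codeword 101011 as a prefix. Helper names are hypothetical.

```python
from fractions import Fraction

# Matching code of Fig. 3: prefix-free codeword -> 4-bit channel block.
# Assumption: the codeword for block 0111 is 1010101 (see the text above).
CODE = {
    "00": "0000", "010": "0001", "011": "0010", "10111": "0011",
    "100": "0100", "10110": "0101", "10100": "0110", "1010101": "0111",
    "110": "1000", "11101": "1001", "11100": "1010", "101011": "1011",
    "11110": "1100", "111110": "1101", "111111": "1110", "1010100": "1111",
}

def is_prefix_free(words):
    return not any(a != b and b.startswith(a) for a in words for b in words)

def kraft_sum(words):
    """Equals 1 exactly when a prefix-free code is full."""
    return sum(Fraction(1, 2 ** len(w)) for w in words)

def block_pmf(code):
    """Dyadic pmf of the blocks when the input bits are iid equiprobable."""
    return {block: Fraction(1, 2 ** len(word)) for word, block in code.items()}

pmf = block_pmf(CODE)
ones_per_block = sum(prob * block.count("1") for block, prob in pmf.items())
print(is_prefix_free(CODE), kraft_sum(CODE), float(ones_per_block) / 4)
```

The last printed value is the average fraction of ones per transmitted bit, which shows how strongly the matcher favours the short symbol 0.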

A. Uniform Transmission 

1) Matching: K iid equiprobable data bits are generated.
No matching is performed.

2) Transmission: Both the K data bits and the M check
bits are transmitted over a BSC with parameter $\epsilon$.

3) Decoder parameters: Since we have uniform priors, we
pass to the decoder for each of the K + M received bits
the LLRs

$$\mathrm{LLR}(0) = \ln \frac{1 - \epsilon}{\epsilon}, \quad \text{if received bit } y = 0$$
$$\mathrm{LLR}(1) = \ln \frac{\epsilon}{1 - \epsilon}, \quad \text{if received bit } y = 1.$$

4) Evaluation: For each $\epsilon$, the shaping gain is according to
(4) calculated as

$$\frac{I(u)}{w^T u} \cdot \frac{1}{C} \tag{27}$$

where u is the uniform pmf. For each coding rate c and
channel parameter $\epsilon$, the coding gain is calculated as

$$\frac{c}{I(u)}. \tag{28}$$



B. Matched Transmission 



1) Matching: For the capacity achieving joint pmf $p^{*4}$ of 4
subsequent bits, the dyadic approximation p is calculated using
the implementation [18] of GHC. The prefix-free matching
code induced by p is displayed in Fig. 3. For the whole
considered range of $\epsilon$, the optimal dyadic approximation
remains unchanged. A sufficiently long data stream of iid
equiprobable bits is generated. By parsing the stream by the
matcher code, a sequence of K/4 blocks consisting each of 4
bits is generated. These blocks are iid according to p.

2) Transmission: We transmit the K matched data bits over
a BSC with parameter $\epsilon$. The M check bits remain unchanged.

3) Decoder: Since we have non-uniform priors $p^* = (\pi_0, \pi_1)^T$,
we pass to the decoder for the K received bits

$$\mathrm{LLR}(0) + \ln \frac{\pi_0}{\pi_1} = \ln \frac{(1 - \epsilon)\pi_0}{\epsilon \pi_1}, \quad \text{if received bit } y = 0$$
$$\mathrm{LLR}(1) + \ln \frac{\pi_0}{\pi_1} = \ln \frac{\epsilon \pi_0}{(1 - \epsilon)\pi_1}, \quad \text{if received bit } y = 1$$

and for the M check bits, we pass

$$+\infty, \quad \text{if check bit } b = 0$$
$$-\infty, \quad \text{if check bit } b = 1.$$
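The decoder inputs for uniform and matched transmission differ only in the prior term, which the following sketch makes explicit. It assumes the sign convention LLR = ln(P(bit = 0) / P(bit = 1)) implied by the formulas above; the prior values 0.73 and 0.27 are hypothetical, and whether a particular decoder implementation accepts infinite LLRs is not addressed here.

```python
import math

def llr_data_bit(y, eps, pi0=0.5, pi1=0.5):
    """Channel LLR of a received bit y over a BSC plus the prior term ln(pi0 / pi1)."""
    channel = math.log((1 - eps) / eps) if y == 0 else math.log(eps / (1 - eps))
    return channel + math.log(pi0 / pi1)

def llr_check_bit(b):
    """Check bits are perfectly known in the bootstrap building block."""
    return math.inf if b == 0 else -math.inf

print(llr_data_bit(0, 0.03), llr_data_bit(1, 0.03))                   # uniform priors
print(llr_data_bit(0, 0.03, 0.73, 0.27), llr_data_bit(1, 0.03, 0.73, 0.27))
print(llr_check_bit(0), llr_check_bit(1))
```

With the default uniform priors the prior term vanishes and the function reduces to the LLRs of uniform transmission.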
Fig. 4. Uniform transmission.

Fig. 5. Matched transmission. (Axis label: {coding, shaping} gain.)



4) Evaluation: For each channel parameter $\epsilon$, the shaping
gain is according to (10) calculated as

$$\frac{I(p)}{v^T p} \cdot \frac{1}{C} \tag{29}$$

where v denotes the weights of the symbol blocks. For each
coding rate c and channel parameter $\epsilon$, the coding gain is
according to (18) calculated as

$$\frac{\frac{H(p)}{4} - \frac{1-c}{c}}{\frac{I(p)}{4}}. \tag{30}$$
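The evaluation formulas for uniform and matched transmission can be computed directly, without any LDPC simulation. The sketch below assumes the codeword lengths read off Fig. 3 (only the lengths enter the dyadic pmf), computes the block mutual information over four uses of the BSC, and approximates C by a grid search instead of the algorithm of [3]. The operating points (rate 3/4, crossover 0.028 and 0.0575) are taken from the discussion of the simulation results; all function names are hypothetical.

```python
import math
from itertools import product

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def entropy_bits(pmf):
    return -sum(x * math.log2(x) for x in pmf if x > 0)

def capacity(eps, w, steps=20000):
    """Grid search for C in (3); a stand-in for the Jimbo-Kunisawa algorithm."""
    def rate(q):
        return (h2(q * (1 - eps) + (1 - q) * eps) - h2(eps)) / ((1 - q) * w[0] + q * w[1])
    return max(rate(i / steps) for i in range(steps + 1))

# Codeword length per 4-bit block, read off Fig. 3.
LENGTH = {"0000": 2, "0001": 3, "0010": 3, "0011": 5, "0100": 3, "0101": 5,
          "0110": 5, "0111": 7, "1000": 3, "1001": 5, "1010": 5, "1011": 6,
          "1100": 5, "1101": 6, "1110": 6, "1111": 7}
PMF = {block: 2.0 ** -length for block, length in LENGTH.items()}

def block_mutual_info(pmf, eps):
    """I(p) for k uses of a BSC: entropy of the output block minus k * h2(eps)."""
    k = len(next(iter(pmf)))
    out = []
    for y in product("01", repeat=k):
        prob = 0.0
        for x, px in pmf.items():
            d = sum(a != b for a, b in zip(x, y))          # Hamming distance
            prob += px * eps ** d * (1 - eps) ** (k - d)
        out.append(prob)
    return entropy_bits(out) - k * h2(eps)

def matched_gains(pmf, eps, w, c):
    k = len(next(iter(pmf)))
    avg_duration = sum(px * sum(w[int(b)] for b in x) for x, px in pmf.items())  # v^T p
    i_p = block_mutual_info(pmf, eps)
    shaping = i_p / avg_duration / capacity(eps, w)
    coding = (entropy_bits(pmf.values()) / k - (1 - c) / c) / (i_p / k)
    return shaping, coding

def uniform_gains(eps, w, c):
    i_u = 1 - h2(eps)                                      # I(u) for the BSC
    shaping = i_u / (0.5 * w[0] + 0.5 * w[1]) / capacity(eps, w)
    return shaping, c / i_u

print(uniform_gains(0.028, (1, 5), 3 / 4))
print(matched_gains(PMF, 0.0575, (1, 5), 3 / 4))
```

These are the analytical gains at a given operating point; which crossover probability a code actually tolerates at a target block error rate still has to come from simulation.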

C. Discussion 

The simulation results are displayed in Fig. 4 and Fig. 5.
Shaping and coding gain versus block error probability $p_b$
are displayed for the rate 3/4, 4/5, 5/6, and 8/9 DVB-S2 codes.
The different operating points were generated by varying the
channel error probability $\epsilon$. For uniform transmission, the
shaping gain is around 0.83. For matched transmission, the
shaping gain is larger than 0.99. The coding gain for
uniform transmission is between 0.92 and 0.95. For matched
transmission, it is between 0.9 and 0.94. For example, for
the rate 3/4 code, the coding gain for uniform transmission
is 0.92 and for matched transmission, it is 0.9, thus slightly
worse. However, for uniform transmission, the 3/4 code works
at $p_b = 10^{-2}$ for some $\epsilon$ between 0.028 and 0.0285, while
for matched transmission, the 3/4 code works at $p_b = 10^{-2}$
for $\epsilon = 0.0575$, which is a significantly higher channel
error probability. If we compare the achieved coding gains of
uniform and matched transmissions at similar $(p_b, \epsilon)$ pairs, we
observe a higher coding gain for matched transmission than for
uniform transmission. For example, we have a coding gain of
0.921 at $(p_b = 0.03, \epsilon = 0.0285)$ for uniform transmission and
a coding gain of more than 0.921 at $(p_b = 0.008, \epsilon = 0.037)$
for matched transmission.
This corresponds to a better coding gain at a lower block error
rate for a higher channel error probability.